This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

Update dataset generation and storage for ragas and prometheus #94

Merged: 8 commits into main on Apr 8, 2024


sfriedowitz (Contributor) commented Apr 4, 2024

What's changing

  • Unifies how datasets are stored by ragas and prometheus.
  • Imports loguru for logging. We just need to be careful not to accidentally send the logger across Ray remote boundaries, as that would cause serialization issues.
  • Adds a default storage path for lm-buddy. We can discuss this and get rid of it if we don't feel it's right.
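As a rough illustration of a shared default storage location, the sketch below resolves a root directory from an environment variable with a home-directory fallback, then derives one dataset directory per job so that ragas and prometheus runs land in the same layout. The environment variable `LM_BUDDY_HOME`, the `~/.lm_buddy` fallback, the `datasets/<job_name>` layout, and the helper names are all hypothetical and are not taken from `src/lm_buddy/constants.py`.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical environment variable that overrides the default storage root.
STORAGE_ENV_VAR = "LM_BUDDY_HOME"


def default_storage_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the storage root: the env var if set, else ~/.lm_buddy (assumed default)."""
    env = os.environ if env is None else env
    override = env.get(STORAGE_ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path.home() / ".lm_buddy"


def dataset_output_path(job_name: str, root: Optional[Path] = None) -> Path:
    """Hypothetical shared layout used by every evaluation job: <root>/datasets/<job_name>."""
    if not job_name or "/" in job_name:
        raise ValueError(f"invalid job name: {job_name!r}")
    root = default_storage_root() if root is None else root
    return root / "datasets" / job_name
```

Keeping the override in an environment variable lets a cluster deployment point every job at shared storage without code changes, while local runs fall back to a per-user directory.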

How to test it

Related Jira Ticket

Additional notes for reviewers

@sfriedowitz sfriedowitz marked this pull request as ready for review April 4, 2024 21:58
@sfriedowitz sfriedowitz requested review from imtihan and aittalam April 4, 2024 21:58
src/lm_buddy/constants.py (outdated, resolved)
imtihan (Contributor) left a comment


LGTM on the Ragas stuff! I'd wait for Davide to speak on prometheus before merging :)

aittalam (Member) left a comment


Thanks for working on this, Sean!
I think both the common storage path and the new dataset generation are nice features to have.
My only concern is what happens when the process breaks, and whether (and how) data can be recovered. However, I saw that Dataset.from_generator still saves incrementally by default, so we can think about recovery after doing some tests with it.
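The recovery concern can be sketched with a stdlib analogue; this is not the `datasets` API and not what `Dataset.from_generator` does internally. The idea is to append each generated record to a JSONL file and flush it immediately, so a crash loses at most the record in flight, then on restart keep only the records that parse cleanly and resume from there. The file layout and the `load_completed` / `generate_with_checkpoint` helpers are hypothetical, and the sketch assumes exactly one output record per input, produced in input order.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, List


def load_completed(path: Path) -> List[dict]:
    """Read records written by a previous (possibly interrupted) run."""
    if not path.exists():
        return []
    records: List[dict] = []
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # A crash mid-write can leave a truncated last line; stop there.
                break
    return records


def generate_with_checkpoint(
    inputs: List[str], process: Callable[[str], dict], path: Path
) -> List[dict]:
    """Process inputs, skipping ones whose records are already on disk."""
    done = load_completed(path)
    # Rewrite only the valid records so a truncated trailing line is dropped.
    with path.open("w", encoding="utf-8") as f:
        for rec in done:
            f.write(json.dumps(rec) + "\n")
    # Resume after the last completed input, persisting each record as it is made.
    with path.open("a", encoding="utf-8") as f:
        for item in inputs[len(done):]:
            f.write(json.dumps(process(item)) + "\n")
            f.flush()
    return load_completed(path)
```

Resuming by position is the simplest possible scheme; a real job would more likely key records by an input ID so that reordering or retries cannot misalign them.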

src/lm_buddy/constants.py (outdated, resolved)
src/lm_buddy/jobs/evaluation/prometheus.py (resolved)
src/lm_buddy/jobs/evaluation/prometheus.py (resolved)
@sfriedowitz sfriedowitz merged commit 706ac29 into main Apr 8, 2024
4 checks passed
@sfriedowitz sfriedowitz deleted the sfriedowitz/fix-dataset-storage branch April 8, 2024 15:52
3 participants